Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

8272609: Add string deduplication support to SerialGC #5153

Closed
wants to merge 10 commits into from

Conversation

D-D-H
Copy link
Contributor

@D-D-H D-D-H commented Aug 18, 2021

Hi team,

Please help review this change to add string deduplication support to SerialGC.

Thanks,
Denghui


Progress

  • Change must not contain extraneous whitespace
  • Commit message must refer to an issue
  • Change must be properly reviewed

Issue

  • JDK-8272609: Add string deduplication support to SerialGC

Reviewers

Reviewing

Using git

Checkout this PR locally:
$ git fetch https://git.openjdk.java.net/jdk pull/5153/head:pull/5153
$ git checkout pull/5153

Update a local copy of the PR:
$ git checkout pull/5153
$ git pull https://git.openjdk.java.net/jdk pull/5153/head

Using Skara CLI tools

Checkout this PR locally:
$ git pr checkout 5153

View PR using the GUI difftool:
$ git pr show -t 5153

Using diff file

Download this PR as a diff file:
https://git.openjdk.java.net/jdk/pull/5153.diff

@D-D-H D-D-H marked this pull request as draft August 18, 2021 02:24
@bridgekeeper
Copy link

bridgekeeper bot commented Aug 18, 2021

👋 Welcome back ddong! A progress list of the required criteria for merging this PR into master will be added to the body of your pull request. There are additional pull request commands available for use with this pull request.

@openjdk
Copy link

openjdk bot commented Aug 18, 2021

@D-D-H The following label will be automatically applied to this pull request:

  • hotspot-gc

When this pull request is ready to be reviewed, an "RFR" email will be sent to the corresponding mailing list. If you would like to change these labels, use the /label pull request command.

@openjdk openjdk bot added the hotspot-gc hotspot-gc-dev@openjdk.org label Aug 18, 2021
@D-D-H D-D-H marked this pull request as ready for review August 18, 2021 14:09
@openjdk openjdk bot added the rfr Pull request is ready for review label Aug 18, 2021
@mlbridge
Copy link

mlbridge bot commented Aug 18, 2021

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes mostly look okay.

But it's not obvious that string deduplication support is useful for SerialGC. The kinds of applications SerialGC is often used for don't really benefit from string deduplication. Can someone suggest a good use-case?

One possible benefit is that if Epsilon is the only collector not supporting string deduplication then we might be able to simplify the string deduplication tests, replacing the per-GC test descriptions with just one to require !Epsilon and otherwise use whatever the current GC happens to be.

src/hotspot/share/gc/serial/markSweep.hpp Outdated Show resolved Hide resolved
@D-D-H
Copy link
Contributor Author

D-D-H commented Aug 19, 2021

But it's not obvious that string deduplication support is useful for SerialGC. The kinds of applications SerialGC is often used for don't really benefit from string deduplication. Can someone suggest a good use-case?

In some serverless applications (such as AWS Lambda) scenarios, SerialGC is generally used as the default GC algorithm. If there are many string constructions in the application, I think string deduplication may be beneficial.

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some more tidying up. Sorry for not noticing a couple of these earlier.

#ifndef SHARE_GC_SERIAL_STRINGDEDUP_HPP
#define SHARE_GC_SERIAL_STRINGDEDUP_HPP

#include "gc/shared/stringdedup/stringDedup.hpp"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is nothing here that requires this header. Rather, it is needed in the inline.hpp file and should be moved there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

removed

src/hotspot/share/gc/serial/serialStringDedup.hpp Outdated Show resolved Hide resolved
// Candidate selection policy for young during evacuation.
// If to is young then age should be the new (survivor's) age.
// if to is old then age should be the age of the copied from object.
static inline bool is_candidate_from_evacuation(oop java_string,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Argument is poorly named. The value might not be, and indeed is unlikely to be, a String. Part of this function's job is to check whether the argument is a String.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed. Use obj as instead.

// Candidate if string is being evacuated from young to old but has not
// reached the deduplication age threshold, i.e. has not previously been a
// candidate during its life in the young generation.
return SerialHeap::heap()->young_gen()->is_in_reserved(java_string) &&

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This depends on implicit includes since there's nothing here to ensure the type of young_gen() is in scope. I think it's coming from serialHeap.hpp, but that could be changed in the future to forward-declare the generation class rather than include the associated header. It might be better to have this function in a .cpp file (as it was in an earlier iteration) to limit header exposure in clients. This function isn't so performance critical, since before calling it we've already checked that deduplication is enabled and the argument is a String, so being inline isn't as important here as for is_candidate_from_evacuation.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

@D-D-H
Copy link
Contributor Author

D-D-H commented Aug 20, 2021

@kimbarrett
Thanks for your careful review!
I have fixed the related problems.

Copy link

@kimbarrett kimbarrett left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good.

@openjdk
Copy link

openjdk bot commented Aug 20, 2021

@D-D-H This change now passes all automated pre-integration checks.

ℹ️ This project also has non-automated pre-integration requirements. Please see the file CONTRIBUTING.md for details.

After integration, the commit message for the final commit will be:

8272609: Add string deduplication support to SerialGC

Reviewed-by: kbarrett, iwalulya

You can use pull request commands such as /summary, /contributor and /issue to adjust it as needed.

At the time when this comment was updated there had been 4 new commits pushed to the master branch:

  • b690f29: 8269687: pauth_aarch64.hpp include name is incorrect
  • f77a1a1: 8272472: StackGuardPages test doesn't build with glibc 2.34
  • 04a806e: 8270344: Session resumption errors
  • d85560e: 8267161: Write automated test case for JDK-4479161

Please see this link for an up-to-date comparison between the source branch of this pull request and the master branch.
As there are no conflicts, your changes will automatically be rebased on top of these commits when integrating. If you prefer to avoid this automatic rebasing, please check the documentation for the /integrate command for further details.

As you do not have Committer status in this project an existing Committer must agree to sponsor your change. Possible candidates are the reviewers of this PR (@kimbarrett, @walulyai) but any other Committer may sponsor as well.

➡️ To flag this PR as ready for integration with the above commit message, type /integrate in a new comment. (Afterwards, your sponsor types /sponsor in a new comment to perform the integration).

@openjdk openjdk bot added the ready Pull request is ready to be integrated label Aug 20, 2021
@@ -601,6 +603,8 @@ void DefNewGeneration::collect(bool full,
// Verify that the usage of keep_alive didn't copy any objects.
assert(heap->no_allocs_since_save_marks(), "save marks have not been newly set.");

_string_dedup_requests.flush();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if placing _string_dedup_requests.flush(); before weak roots processing (WeakProcessor::weak_oops_do) makes more senses. The documentation says flush should be called when marking is done; for Serial young GC, it's equivalent to when evacuation is done, which is right after ref-processing.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That documentation is mistaken; that's a left-over from an earlier development version that I forgot to update. Without special additional care, the lifetime also needs to cover final-reference processing. I have a todo item to file a bug against that comment.

@@ -214,4 +216,5 @@ void MarkSweep::KeepAliveClosure::do_oop(narrowOop* p) { MarkSweep::KeepAliveClo
void MarkSweep::initialize() {
MarkSweep::_gc_timer = new (ResourceObj::C_HEAP, mtGC) STWGCTimer();
MarkSweep::_gc_tracer = new (ResourceObj::C_HEAP, mtGC) SerialOldTracer();
MarkSweep::_string_dedup_requests = new StringDedup::Requests();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not clear to me why _string_dedup_requests is a pointer. Is it possible to define static StringDedup::Requests _string_dedup_requests; directly? That way, Requests doesn't need to use the inheritance.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a non-local variable with a non-trivial destructor. See
https://mail.openjdk.java.net/pipermail/hotspot-gc-dev/2021-August/036412.html

@@ -112,6 +112,8 @@ void GenMarkSweep::invoke_at_safepoint(ReferenceProcessor* rp, bool clear_all_so

deallocate_stacks();

MarkSweep::_string_dedup_requests->flush();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly, flush should be called inside mark_sweep_phase1, right after ref-processing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your review and Kim's detailed explanation.

I think putting this invocation here or the position you suggest should have no effect on the result, so I tend not to make this modification.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it's desirable to narrow the request window (btw first add and flush), which offers more prompt string-dedup and memory release (in flush). Additionally, it makes more sense to call flush right after the complete live objects traversal (strong-marking + final-marking as part of ref-processing), since we will not visit any new objects then.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer it where it is. Rather than putting something between phantom
reference processing an oopstorage-based weak processing, I would prefer to
merge those two. I don't remember if there's an RFE for that.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think flush should be called sooner, e.g. inside mark_sweep_phase1, which is not in conflict with merging phantom-processing and weak root processing. Anyway, this is subjective; feel free to integrate the current version.

@D-D-H
Copy link
Contributor Author

D-D-H commented Aug 23, 2021

@kimbarrett @walulyai @albertnetymk thanks for the review!

@D-D-H
Copy link
Contributor Author

D-D-H commented Aug 23, 2021

/integrate

@openjdk openjdk bot added the sponsor Pull request is ready to be sponsored label Aug 23, 2021
@openjdk
Copy link

openjdk bot commented Aug 23, 2021

@D-D-H
Your change (at version 077ca27) is now ready to be sponsored by a Committer.

@y1yang0
Copy link
Member

y1yang0 commented Aug 23, 2021

/sponsor

@openjdk
Copy link

openjdk bot commented Aug 23, 2021

Going to push as commit e8a289e.
Since your change was applied there have been 4 commits pushed to the master branch:

  • b690f29: 8269687: pauth_aarch64.hpp include name is incorrect
  • f77a1a1: 8272472: StackGuardPages test doesn't build with glibc 2.34
  • 04a806e: 8270344: Session resumption errors
  • d85560e: 8267161: Write automated test case for JDK-4479161

Your commit was automatically rebased without conflicts.

@openjdk openjdk bot closed this Aug 23, 2021
@openjdk openjdk bot added integrated Pull request has been integrated and removed ready Pull request is ready to be integrated rfr Pull request is ready for review sponsor Pull request is ready to be sponsored labels Aug 23, 2021
@openjdk
Copy link

openjdk bot commented Aug 23, 2021

@kelthuzadx @D-D-H Pushed as commit e8a289e.

💡 You may see a message that your pull request was closed with unmerged commits. This can be safely ignored.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
hotspot-gc hotspot-gc-dev@openjdk.org integrated Pull request has been integrated
5 participants